Vector and Matrix Calculus
Order | Scalar Field | Vector Field |
---|---|---|
0th Derivative | \( f \) | \(\mathbf{f}\) |
1st Derivative | \(\nabla f\) Gradient | \(J(\mathbf{f})\) Jacobian \(\supset \nabla\cdot\mathbf{f}\) Divergence and \(\nabla\times\mathbf{f}\) Curl |
2nd Derivative | \(H(f)\) Hessian \(\supset\nabla^2f\) Laplacian | |
- \(\nabla\cdot\) and \(\nabla\times\) are formal products, so the usual rules for dot and cross products (such as commutativity) do not necessarily apply.
- \(\nabla\) does not combine with dot and cross products like an ordinary vector, so the placement of parentheses matters; for example, \( \mathbf{v}\cdot (\nabla \mathbf{w}) \neq (\mathbf{v}\cdot \nabla) \mathbf{w}. \)
1. Gradient
- A vector field that points in the direction of the greatest rate of increase of a scalar field; its magnitude is that rate.
1.1. Definition
- For a differentiable map \(f\colon X\to Y\), the gradient is the map \(\nabla f\colon X\to Z\) such that \[ dy = \langle \nabla f, dx\rangle, \] where \(\langle \cdot, \cdot \rangle\colon Z\times X \to Y\) is a given bilinear map (for \(f\colon\mathbb{R}^n\to\mathbb{R}\), the standard dot product).
1.1.1. Orthogonal Curvilinear Coordinate System
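- \[ \nabla f = \sum_i \frac{1}{h_i}\frac{\partial f}{\partial \tilde{x}^i} \tilde{\mathbf{e}}_i \]
- where \[ h_i = \left\| \frac{\partial \mathbf{r}}{\partial \tilde{x}^i}\right\| \] and \(\tilde{\mathbf{e}}_i\) is the unit vector along \(\partial\mathbf{r}/\partial\tilde{x}^i\).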
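To see the scale factors at work, take polar coordinates \(\mathbf{r} = (r\cos\theta, r\sin\theta)\), where \(h_r = 1\) and \(h_\theta = r\), so the components of \(\nabla f\) along the unit vectors \(\hat{\mathbf{e}}_r, \hat{\mathbf{e}}_\theta\) are \(\partial_r f\) and \(\frac{1}{r}\partial_\theta f\). The sketch below, using an arbitrary test field `f_cart` and central differences (both illustrative assumptions), checks that these components reproduce the Cartesian gradient once the unit vectors are expanded.

```python
import math

def f_cart(x, y):
    # Illustrative scalar field, written in Cartesian coordinates
    return x * x * y

def f_polar(r, t):
    # The same field expressed in polar coordinates (r, theta)
    return f_cart(r * math.cos(t), r * math.sin(t))

def d(fun, a, b, which, h=1e-6):
    """Central difference of a two-argument function in argument `which` (0 or 1)."""
    if which == 0:
        return (fun(a + h, b) - fun(a - h, b)) / (2 * h)
    return (fun(a, b + h) - fun(a, b - h)) / (2 * h)

r, t = 1.5, 0.7
h_r, h_t = 1.0, r                      # scale factors of polar coordinates
# Components along the unit vectors: (1/h_i) * df/dq_i
g_r = d(f_polar, r, t, 0) / h_r
g_t = d(f_polar, r, t, 1) / h_t
# Expand e_r = (cos t, sin t) and e_theta = (-sin t, cos t) into Cartesian form
gx = g_r * math.cos(t) - g_t * math.sin(t)
gy = g_r * math.sin(t) + g_t * math.cos(t)

x, y = r * math.cos(t), r * math.sin(t)
print(gx, 2 * x * y)   # d/dx (x^2 y) = 2xy
print(gy, x * x)       # d/dy (x^2 y) = x^2
```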
1.2. Properties
- In terms of differential forms, this can also be written as \[ dy = df(dx). \]
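As a numerical sanity check of \(dy = \langle \nabla f, dx\rangle\), the sketch below approximates the gradient of a scalar field on \(\mathbb{R}^n\) with central differences and compares \(\langle\nabla f, d\mathbf{x}\rangle\) with the actual change \(f(\mathbf{x}+d\mathbf{x})-f(\mathbf{x})\). It assumes the Euclidean inner product; the test function `f`, the point, and the step sizes are illustrative choices.

```python
import math

def grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of f at point x."""
    g = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def f(x):
    # Illustrative scalar field f(x, y) = x^2 y + sin(y)
    return x[0] ** 2 * x[1] + math.sin(x[1])

x = [1.0, 2.0]
dx = [1e-4, -2e-4]
g = grad(f, x)                                      # exact: (2xy, x^2 + cos y)
dy_linear = sum(gi * di for gi, di in zip(g, dx))   # <grad f, dx>
dy_actual = f([x[0] + dx[0], x[1] + dx[1]]) - f(x)
print(g, dy_linear, dy_actual)
```

The two changes agree up to a second-order term in \(d\mathbf{x}\), which is what the definition asserts.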
1.3. Formulae
2. Divergence
2.1. Definition
- The divergence of a vector field \(\mathbf{F}\) in Cartesian coordinates is \[ \nabla\cdot \mathbf{F} = \sum_i \frac{\partial F_i}{\partial x_i}. \]
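A central-difference sketch of the Cartesian definition, summing \(\partial F_i/\partial x_i\) over the components. The field `F` is a hypothetical example chosen so that the exact divergence, \(x + y + z\), is easy to check by hand.

```python
def divergence(F, x, h=1e-5):
    """Sum of central differences dF_i/dx_i at point x (Cartesian coordinates)."""
    total = 0.0
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        total += (F(xp)[i] - F(xm)[i]) / (2 * h)
    return total

def F(p):
    # Example field F = (x y, y z, z x); exact divergence is y + z + x
    x, y, z = p
    return [x * y, y * z, z * x]

p = [1.0, 2.0, 3.0]
print(divergence(F, p))   # close to the exact value 1 + 2 + 3 = 6
```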
2.1.1. Orthogonal Curvilinear Coordinate System
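- \[ \nabla\cdot \mathbf{F} = \frac{1}{\prod_j h_j}\sum_i\frac{\partial}{\partial \tilde{x}^i}\left(\tilde{F}^i\prod_{j\neq i}h_j\right) \] where \[ h_i = \left\| \frac{\partial \mathbf{r}}{\partial \tilde{x}^i}\right\| \] and \(\tilde{F}^i\) is the component of \(\mathbf{F}\) along the unit vector \(\tilde{\mathbf{e}}_i\).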
2.2. Interpretation
- The net outward flux per unit volume, in the limit of a vanishing volume.
- The relative rate of change of volume, \(\frac{1}{V}\frac{dV}{dt}\), of a small region carried along by the flow of the vector field.
- For a vector field given by a linear transformation, \(\mathbf{F} = \mathbf{A}\mathbf{x}\):
- \[ \nabla\cdot(\mathbf{A}\mathbf{x}) = \operatorname{tr}\mathbf{A} = \frac{d}{dt}\ln V \]
- The infinitesimal transformation generated by the vector field
\(\mathbf{F}\) is:
\[
\tilde{x}^i = x^i + F^idt
\]
- The Jacobian of the transformation is: \[ J_i^j = \begin{bmatrix} 1 + \partial_{x^1}F^1dt & \partial_{x^2}F^1dt & \cdots & \partial_{x^n}F^1dt \\ \partial_{x^1}F^2dt & 1 + \partial_{x^2}F^2dt & \cdots & \partial_{x^n}F^2dt \\ \vdots & \vdots & \ddots & \vdots \\ \partial_{x^1}F^ndt & \partial_{x^2}F^ndt & \cdots & 1+ \partial_{x^n}F^ndt \\ \end{bmatrix} \]
- And the determinant is: \[ \det J_i^j = 1 + \nabla\cdot \mathbf{F}\,dt + O(dt^2) \]
- Taking \(dt\) as the elapsed time \(t\) and differentiating at \(t = 0\), where \(\det J_i^j = 1\): \[ \left.\frac{d}{dt} \det J_i^j\right|_{t=0} = \nabla\cdot \mathbf{F} \]
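For the linear case \(\mathbf{F} = \mathbf{A}\mathbf{x}\) the determinant expansion can be checked directly: the Jacobian of \(\tilde{\mathbf{x}} = \mathbf{x} + \mathbf{A}\mathbf{x}\,dt\) is \(\mathbf{I} + \mathbf{A}\,dt\), so \((\det(\mathbf{I}+\mathbf{A}\,dt) - 1)/dt\) should approach \(\nabla\cdot\mathbf{F} = \operatorname{tr}\mathbf{A}\) as \(dt \to 0\). The 2×2 matrix below is an arbitrary example.

```python
def det2(M):
    # Determinant of a 2x2 matrix given as nested lists
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[0.5, 2.0], [-1.0, 0.25]]   # arbitrary example; tr A = 0.75
trace_A = A[0][0] + A[1][1]

for dt in (1e-1, 1e-3, 1e-5):
    # Jacobian of the infinitesimal map x -> x + A x dt
    J = [[1 + A[0][0] * dt, A[0][1] * dt],
         [A[1][0] * dt, 1 + A[1][1] * dt]]
    rate = (det2(J) - 1) / dt    # relative volume change per unit time
    print(dt, rate, trace_A)
```

For a 2×2 matrix, \(\det(\mathbf{I}+\mathbf{A}\,dt) = 1 + \operatorname{tr}\mathbf{A}\,dt + \det\mathbf{A}\,dt^2\) exactly, so the printed rate differs from \(\operatorname{tr}\mathbf{A}\) by \(\det\mathbf{A}\,dt\), the \(O(dt^2)\) term divided by \(dt\).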
3. Curl
3.1. Generalization
3.1.1. Orthogonal Curvilinear Coordinate System
- \[ \nabla\times \mathbf{F} = \frac{1}{h_1h_2h_3}\begin{vmatrix} h_1\tilde{\mathbf{e}}_1 & h_2\tilde{\mathbf{e}}_2 & h_3\tilde{\mathbf{e}}_3 \\[.5em] \dfrac{\partial}{\partial \tilde{x}^1} & \dfrac{\partial}{\partial \tilde{x}^2} & \dfrac{\partial}{\partial \tilde{x}^3} \\[1em] h_1\tilde{F}^1 & h_2\tilde{F}^2 & h_3\tilde{F}^3 \\ \end{vmatrix} \]
- where \[ h_i = \left\| \frac{\partial \mathbf{r}}{\partial \tilde{x}^i}\right\|. \]
3.1.2. General Coordinate System
- \[
(\nabla \times \mathbf{F} )^k = \frac{1}{\sqrt{g}} \varepsilon^{k\ell m} (\nabla_\ell \mathbf{F})_m
\]
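- where \(\nabla_\ell\) is the covariant derivative and \(\varepsilon^{k\ell m}\) is the Levi-Civita symbol.
- Since the Christoffel symbols are symmetric in their lower indices, their contributions cancel against the antisymmetric \(\varepsilon^{k\ell m}\), leaving only partial derivatives: \[ (\nabla \times \mathbf{F} ) = \frac{1}{\sqrt{g}} \mathbf{e}_k\varepsilon^{k\ell m} \partial_\ell F_m \]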
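In Cartesian coordinates \(g = 1\) and the formula reduces to \((\nabla\times\mathbf{F})^k = \varepsilon^{k\ell m}\partial_\ell F_m\). The sketch below evaluates that sum literally, with an explicit Levi-Civita symbol and central-difference derivatives; the rigid-rotation field \((-y, x, 0)\) is an illustrative choice whose curl is \((0, 0, 2)\).

```python
import itertools

def levi_civita(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) / 2

def curl(F, x, h=1e-5):
    """(curl F)^k = eps^{k l m} d_l F_m, with derivatives by central differences."""
    # dF[l][m] approximates d F_m / d x_l
    dF = []
    for l in range(3):
        xp = list(x)
        xm = list(x)
        xp[l] += h
        xm[l] -= h
        Fp, Fm = F(xp), F(xm)
        dF.append([(Fp[m] - Fm[m]) / (2 * h) for m in range(3)])
    return [sum(levi_civita(k, l, m) * dF[l][m]
                for l, m in itertools.product(range(3), repeat=2))
            for k in range(3)]

def F(p):
    # Rigid rotation about the z-axis
    x, y, z = p
    return [-y, x, 0.0]

print(curl(F, [0.3, -1.2, 2.0]))
```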
3.1.3. Differential Form
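- \[ \nabla\times\mathbf{F} = \left(\star(\mathrm{d}\mathbf{F}^\flat)\right)^\sharp \]
- where \(\flat\) and \(\sharp\) are the musical isomorphisms (\(\flat\) maps a vector field to the corresponding 1-form and \(\sharp\) maps a 1-form back to a vector field), and \(\star\) is the Hodge star in three dimensions.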
4. Laplacian
4.1. Definition
- \[ \nabla^2 f = \nabla\cdot\nabla f \]
- \(\nabla^2\) is the usual notation in physics, while \(\Delta\) is more common in mathematics.
4.2. Properties
- Divergence of the gradient.
- Trace of the Hessian.
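Since the Laplacian is the trace of the Hessian, it can be approximated by summing only the diagonal second differences. The sketch below assumes a smooth field and a fixed step `h`; the field \(f = x^2 + y^2 + z^3\), with exact Laplacian \(4 + 6z\), is an illustrative choice.

```python
def laplacian(f, x, h=1e-4):
    """Trace of the central-difference Hessian: sum_i d^2 f / dx_i^2."""
    total = 0.0
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        total += (f(xp) - 2 * f(x) + f(xm)) / (h * h)
    return total

def f(p):
    # Illustrative field f = x^2 + y^2 + z^3; exact Laplacian is 4 + 6z
    x, y, z = p
    return x * x + y * y + z ** 3

print(laplacian(f, [0.5, -1.0, 1.0]))   # close to 4 + 6 * 1 = 10
```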
5. Jacobian
- Also describes the local transformation between curvilinear coordinate systems.
5.1. Definition
The Jacobian matrix of a vector field \(\mathbf{f}\) is \[ J^{i}{}_{j}=\frac{\partial f^i}{\partial x^j} \] where \(i\) is the row index and \(j\) is the column index.
It gives the rate of change of the vector field in any direction, via the identity \( \mathrm{d}f^i=J^{i}{}_{j}\mathrm{d}x^j \), or equivalently \( \mathrm{d}\mathbf{f}=\mathbf{J}\,\mathrm{d}\mathbf{x} \).
Beware that some authors define the Jacobian as the transpose of this matrix.
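As an illustration of the row/column convention (\(i\) indexes outputs, \(j\) indexes inputs), the sketch below approximates the Jacobian of the polar-to-Cartesian map \((r, \theta) \mapsto (r\cos\theta, r\sin\theta)\) one column at a time and compares it with the analytic matrix. The map, the evaluation point, and the helper names are illustrative choices.

```python
import math

def jacobian(f, x, h=1e-6):
    """J[i][j] ~ d f^i / d x^j, built one column (one input direction) at a time."""
    m = len(f(x))
    J = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def polar_to_cartesian(q):
    r, t = q
    return [r * math.cos(t), r * math.sin(t)]

r, t = 2.0, 0.5
J = jacobian(polar_to_cartesian, [r, t])
J_exact = [[math.cos(t), -r * math.sin(t)],
           [math.sin(t),  r * math.cos(t)]]
print(J)
print(J_exact)
```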
5.2. Inverse
\[ J^{-1}{}^i{}_j := \frac{\partial x^i}{\partial f^j} \] The inverse matrix can also be written concisely as \[ J^{-1}{}^i{}_j = J_j{}^i. \]
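- This is the matrix inverse of \(\mathbf{J}\). In general \(\partial x^i/\partial f^j \neq 1/(\partial f^j/\partial x^i)\), so the inverse cannot be obtained by reciprocating each element and transposing.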
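The sketch below uses the polar-to-Cartesian map as an illustrative example. It compares the matrix inverse of \(\partial x^i/\partial q^j\) with the analytic derivatives \(\partial q^i/\partial x^j\) of \(r = \sqrt{x^2+y^2}\), \(\theta = \operatorname{atan2}(y, x)\), and also with the matrix obtained by reciprocating and transposing the entries elementwise.

```python
import math

def inv2(M):
    """Inverse of a 2x2 matrix."""
    a, b = M[0]
    c, d = M[1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

r, t = 2.0, 0.5
x, y = r * math.cos(t), r * math.sin(t)

# J[i][j] = d x^i / d q^j for (x, y) = (r cos t, r sin t)
J = [[math.cos(t), -r * math.sin(t)],
     [math.sin(t),  r * math.cos(t)]]

# Analytic d q^i / d x^j for r = sqrt(x^2 + y^2), t = atan2(y, x)
rho2 = x * x + y * y
J_inv_exact = [[x / math.sqrt(rho2), y / math.sqrt(rho2)],
               [-y / rho2,           x / rho2]]

# Elementwise "reciprocate and transpose", for comparison
J_recip_T = [[1 / J[j][i] for j in range(2)] for i in range(2)]

print(inv2(J))       # matches J_inv_exact
print(J_inv_exact)
print(J_recip_T)     # does not match
```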
5.3. Change of Basis
The Jacobian of the coordinate transformation from coordinates \(x^j\) to coordinates \(\tilde{x}^i\) is \[ J^i{}_j=\frac{\partial \tilde{x}^i}{\partial x^j} \] which transforms the (contravariant) components: \( \tilde{v}^i = J^i{}_j v^j \).
To transform the basis, the inverse Jacobian is used. \[ \frac{\partial}{\partial \tilde{x}^j}=J_j{}^i\frac{\partial}{\partial x^i} \] equivalently, \[ \begin{bmatrix}\tilde{\mathbf{e}}_{1}&\tilde{\mathbf{e}}_{2}&\cdots&\tilde{\mathbf{e}}_{n}\end{bmatrix}=\begin{bmatrix}\mathbf{e}_{1}&\mathbf{e}_{2}&\cdots&\mathbf{e}_{n}\end{bmatrix}\mathbf{J}^{-1}. \]
\(\mathbf{J} : TM \to TN\) | \(TM \to TN\) | \(TN \to TM\) |
---|---|---|
Covariant | \(\mathbf{J}^{-1}\) | \(\mathbf{J}\) |
Contravariant | \(\mathbf{J}\) | \(\mathbf{J}^{-1}\) |
5.4. Determinant
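The determinant of the Jacobian is the local ratio of volumes under the transformation. Its absolute value is therefore the scaling factor when changing variables in an integral; for an injective \(\mathbf{f}\), \[ \int_{\mathbf{f}(U)} g(\mathbf{y})\,\mathrm{d}\mathbf{y} = \int_U g(\mathbf{f}(\mathbf{x}))\,\lvert\det\mathbf{J}\rvert\,\mathrm{d}\mathbf{x}. \]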
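To illustrate the determinant as an area factor, the sketch integrates \(1\) over the unit disk in polar coordinates, where \(|\det\mathbf{J}| = r\), with a midpoint rule; the sum should reproduce the area \(\pi\). The grid sizes are arbitrary choices.

```python
import math

n_r, n_t = 200, 360            # number of cells in r and theta (arbitrary)
dr, dt = 1.0 / n_r, 2 * math.pi / n_t

area = 0.0
for a in range(n_r):
    r = (a + 0.5) * dr         # midpoint in r
    for _ in range(n_t):
        jac_det = r            # |det J| of (r, t) -> (r cos t, r sin t)
        area += jac_det * dr * dt   # integrand is 1
print(area, math.pi)
```

Because the midpoint rule is exact for the linear integrand \(r\), the result matches \(\pi\) up to floating-point rounding.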
6. Hessian
6.1. Definition
The Hessian \(\mathbf{H}\) of a twice-differentiable scalar field \(f\) is: \[ H_{ij} = \frac{\partial^2 f}{\partial x^i\partial x^j}. \]
6.2. Properties
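- The Hessian matrix is the transpose of the Jacobian matrix of the gradient; when the second partial derivatives are continuous, the Hessian is symmetric and the two coincide.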
- \((\mathrm{d}\mathbf{x})^{\rm T}\mathbf{H}[f]\mathrm{d}\mathbf{x} = (\mathrm{d}\nabla f)^{\rm T}\mathrm{d}\mathbf{x}.\)
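- If it is evaluated at a stationary point, where \(\nabla f = \mathbf{0}\), then \(\nabla f(\mathbf{x}+\mathrm{d}\mathbf{x}) = \mathrm{d}\nabla f\) to first order, so \(\mathrm{d}\nabla f\) points in the direction of the gradient at the neighboring point \(\mathbf{x}+\mathrm{d}\mathbf{x}\).
- Notice that \(\nabla f\) is normal to the level sets of \(f\); normalized to unit length, it is the Gauss map of those level sets.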
- If \(\nabla f(\mathbf{x}) = \mathbf{0}\) and the Hessian is positive-definite at \(\mathbf{x}\), then \(f\) attains an isolated local minimum at \(\mathbf{x}\); likewise, if the Hessian is negative-definite there, \(f\) attains an isolated local maximum.
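A minimal sketch of the second-derivative test in two variables, assuming the point is already known to be stationary: the Hessian is approximated with central differences, and definiteness is read off from its leading principal minors (Sylvester's criterion). The function `f` and the point are illustrative.

```python
def hessian(f, x, h=1e-4):
    """Central-difference Hessian H[i][j] ~ d^2 f / dx_i dx_j."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                p = list(x)
                p[i] += si * h
                p[j] += sj * h
                return f(p)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H

def classify_2d(H):
    """Second-derivative test for a 2x2 Hessian at a stationary point."""
    d1 = H[0][0]
    d2 = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    if d1 > 0 and d2 > 0:
        return "isolated local minimum"
    if d1 < 0 and d2 > 0:
        return "isolated local maximum"
    if d2 < 0:
        return "saddle point"
    return "inconclusive"

def f(p):
    # Illustrative quadratic with a stationary point at (1, -2)
    u, v = p[0] - 1, p[1] + 2
    return u * u + 3 * v * v + u * v

print(classify_2d(hessian(f, [1.0, -2.0])))
```

Here the Hessian is \(\begin{bmatrix}2&1\\1&6\end{bmatrix}\), whose leading minors \(2\) and \(11\) are both positive.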
7. Identities
8. Notations
There are two main notational conventions for derivatives with respect to a vector or a matrix: the numerator layout and the denominator layout. Each has its own advantages and disadvantages, and some authors mix them. It is generally advisable to follow the layout used by the textbook or reference at hand.
The numerator layout treats the vector in the numerator as a column vector and the vector in the denominator as a row vector. For example, for \(\mathbf{y}\in\mathbb{R}^m\) and \(\mathbf{x}\in\mathbb{R}^n\), \[ \frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots &\ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \\ \end{bmatrix}, \] which matches the layout of the standard Jacobian.
Similarly, the denominator layout treats the vector in the numerator as a row vector and the vector in the denominator as a column vector. For example, for a scalar \(f\), \[ \frac{\partial f}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \\ \end{bmatrix}, \] which matches the layout of the standard gradient.
A matrix can appear in either the numerator or the denominator, but not both. In the numerator layout, a matrix in the denominator is treated as its transpose, so \(\partial y/\partial\mathbf{X}\) for a scalar \(y\) has the shape of \(\mathbf{X}^{\rm T}\). Derivatives that would produce tensors of rank higher than 2 are outside the scope of this notation.
This notation is just for convenience. See Matrix calculus - Wikipedia for more.
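The two layouts differ only by a transpose. For the linear map \(\mathbf{y} = \mathbf{A}\mathbf{x}\), the numerator layout gives \(\partial\mathbf{y}/\partial\mathbf{x} = \mathbf{A}\) (an \(m\times n\) matrix) while the denominator layout gives \(\mathbf{A}^{\rm T}\) (\(n\times m\)). The sketch below checks this numerically for an arbitrary 2×3 matrix; the helper names are illustrative.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def numerator_layout(f, x, h=1e-6):
    """dy/dx with one row per output y_i and one column per input x_j."""
    m = len(f(x))
    D = [[0.0] * len(x) for _ in range(m)]
    for j in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[j] += h
        xm[j] -= h
        yp, ym = f(xp), f(xm)
        for i in range(m):
            D[i][j] = (yp[i] - ym[i]) / (2 * h)
    return D

def denominator_layout(f, x, h=1e-6):
    """Same derivative in denominator layout: the transpose of the numerator layout."""
    D = numerator_layout(f, x, h)
    return [list(col) for col in zip(*D)]

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]          # y in R^2, x in R^3
x = [0.1, 0.2, 0.3]
num = numerator_layout(lambda v: matvec(A, v), x)
den = denominator_layout(lambda v: matvec(A, v), x)
print(len(num), len(num[0]))   # 2 3
print(len(den), len(den[0]))   # 3 2
```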
9. Derivative
9.1. Leibniz rule
\[ \frac{d}{dx}(\mathbf{A}\mathbf{B}) = \frac{d\mathbf{A}}{dx}\mathbf{B} + \mathbf{A}\frac{d\mathbf{B}}{dx} \]
9.2. Of Inverse Matrix
\[\frac{d\mathbf{A}^{-1}}{dx}=-\mathbf{A}^{-1}\frac{d\mathbf{A}}{dx}\mathbf{A}^{-1}\]
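The inverse-matrix rule follows from differentiating \(\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}\) with the Leibniz rule: \(\mathbf{A}'\mathbf{A}^{-1} + \mathbf{A}(\mathbf{A}^{-1})' = \mathbf{O}\), then multiplying on the left by \(\mathbf{A}^{-1}\). The sketch below checks the result numerically for a hypothetical smooth, invertible 2×2 matrix function `A(x)`, comparing a central difference of \(\mathbf{A}^{-1}\) with \(-\mathbf{A}^{-1}\mathbf{A}'\mathbf{A}^{-1}\).

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inv2(M):
    a, b = M[0]
    c, d = M[1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def A(x):
    # Hypothetical smooth, invertible matrix function of a scalar x
    return [[2.0 + x, math.sin(x)], [x * x, 3.0]]

def dA(x):
    # Entrywise derivative of A(x)
    return [[1.0, math.cos(x)], [2 * x, 0.0]]

x, h = 0.4, 1e-6
Ai = inv2(A(x))
formula = [[-v for v in row] for row in matmul(matmul(Ai, dA(x)), Ai)]
numeric = [[(p - m) / (2 * h) for p, m in zip(rp, rm)]
           for rp, rm in zip(inv2(A(x + h)), inv2(A(x - h)))]
print(formula)
print(numeric)
```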
10. Exponential
- \[ e^{\mathbf{A}} := \sum_{n=0}^\infty \frac{\mathbf{A}^n}{n!}. \]
10.1. Properties
- \[ \mathbf{A}\mathbf{B} = \mathbf{B}\mathbf{A} \implies e^{\mathbf{A}}e^\mathbf{B} = e^{\mathbf{A}+\mathbf{B}} \] (the converse does not hold in general).
- \[ e^\mathbf{O} = \mathbf{I},\quad \left(e^\mathbf{A}\right)^{-1} = e^{-\mathbf{A}},\quad \left(e^\mathbf{A}\right)^n = e^{n\mathbf{A}} \]
- \[ \left(e^{\mathbf{A}}\right)^{\mathrm T} = e^{\mathbf{A}^\mathrm{T}}, \quad \operatorname{det}\left(e^\mathbf{A}\right) = e^{\operatorname{tr}(\mathbf{A})} \]
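- If \(\mathbf{A}\) is diagonalizable, \(\mathbf{A} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1}\) with \(\mathbf{\Lambda}\) diagonal, then \[ e^{\mathbf{A}} = \mathbf{V}e^{\mathbf{\Lambda}}\mathbf{V}^{-1}, \] where \(e^{\mathbf{\Lambda}}\) exponentiates each diagonal entry.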
- The solution to the differential equation \[ \mathbf{y}' = \mathbf{A}\mathbf{y},\quad \mathbf{y}(0) = \mathbf{y}_0 \] is given by the matrix exponential, \[ \mathbf{y}(t) = e^{\mathbf{A}t}\mathbf{y}_0, \] for any constant square matrix \(\mathbf{A}\).
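The properties above can be explored with a minimal sketch of the defining series, truncated after a fixed number of terms. That truncation is an assumption adequate only for matrices of modest norm; library routines such as SciPy's `scipy.linalg.expm` use scaling and squaring instead. The sketch checks \(\det(e^{\mathbf{A}}) = e^{\operatorname{tr}\mathbf{A}}\) and shows that for a non-commuting pair, \(e^{\mathbf{A}}e^{\mathbf{B}}\) and \(e^{\mathbf{A}+\mathbf{B}}\) differ; the matrices are arbitrary examples.

```python
import math

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm_series(A, terms=30):
    """Truncated power series sum_{n < terms} A^n / n! for a square matrix A."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]   # A^0 = I
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term_k = term_{k-1} A / k, i.e. A^k / k!
        term = [[v / k for v in row] for row in matmul(term, A)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[0.3, 1.0], [0.0, -0.5]]
B = [[0.0, 0.0], [1.0, 0.0]]      # AB != BA for this pair
E = expm_series(A)
print(det2(E), math.exp(A[0][0] + A[1][1]))   # det(e^A) vs e^{tr A}

AB_sum = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
EAEB = matmul(expm_series(A), expm_series(B))   # e^A e^B
EAB = expm_series(AB_sum)                       # e^{A+B}
print(EAEB)
print(EAB)
```

Here \(\mathbf{A}+\mathbf{B}\) is symmetric, so \(e^{\mathbf{A}+\mathbf{B}}\) is symmetric, whereas \(e^{\mathbf{A}}e^{\mathbf{B}}\) is not; the two products therefore cannot be equal.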
11. Jacobi's Formula
- Complementary to Liouville's formula.
11.1. Formula
- \[ \frac{d}{dt}\det \mathbf{A}(t) = \operatorname{tr}\left(\operatorname{adj}(\mathbf{A}(t))\frac{d\mathbf{A}(t)}{dt}\right) \] where \(\operatorname{adj}\) is the adjugate matrix.
- If \(\mathbf{A}(t)\) is invertible, this becomes \[ \frac{d}{dt}\det\mathbf{A}(t) = \det(\mathbf{A}(t)) \operatorname{tr}\left(\mathbf{A}^{-1}(t)\frac{d}{dt}\mathbf{A}(t)\right) \]
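A finite-difference check of Jacobi's formula for a hypothetical smooth 2×2 matrix function `A(t)`, using the 2×2 adjugate \(\operatorname{adj}\begin{bmatrix}a&b\\c&d\end{bmatrix} = \begin{bmatrix}d&-b\\-c&a\end{bmatrix}\).

```python
import math

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def adj2(M):
    # Adjugate (transpose of the cofactor matrix) of a 2x2 matrix
    return [[M[1][1], -M[0][1]], [-M[1][0], M[0][0]]]

def trace_of_product(X, Y):
    # tr(XY) = sum_{i,j} X[i][j] * Y[j][i]
    return sum(X[i][j] * Y[j][i] for i in range(2) for j in range(2))

def A(t):
    # Hypothetical smooth matrix function
    return [[math.cos(t), t], [t * t, 2.0 + math.sin(t)]]

def dA(t):
    # Entrywise derivative of A(t)
    return [[-math.sin(t), 1.0], [2 * t, math.cos(t)]]

t, h = 0.7, 1e-6
lhs = (det2(A(t + h)) - det2(A(t - h))) / (2 * h)   # d/dt det A(t)
rhs = trace_of_product(adj2(A(t)), dA(t))           # tr(adj(A) A')
print(lhs, rhs)
```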
11.2. Properties
- This means
- \[
\frac{\partial \det\mathbf{A}}{\partial A_{ij}} = (\operatorname{adj}\mathbf{A})_{ji} = (\mathbf{C})_{ij},
\]
- where \(\mathbf{C}\) is the cofactor matrix;
- \[
d\det(\mathbf{A}) = \operatorname{tr}(\operatorname{adj}(\mathbf{A})\,d\mathbf{A}) = \langle (\operatorname{adj}\mathbf{A})^{\rm T}, d\mathbf{A}\rangle_{\rm F},
\]
- where \(\langle \cdot,\cdot\rangle_{\rm F}\) is the Frobenius inner product;
- \[
\nabla \operatorname{det}(\mathbf{A}) = (\operatorname{adj}\mathbf{A})^{\rm T} = \mathbf{C},
\]
- where \(\nabla\) is the gradient.
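The entrywise statement \(\partial\det\mathbf{A}/\partial A_{ij} = C_{ij}\) can be checked by perturbing one entry at a time. The sketch below uses an arbitrary 3×3 example matrix and computes each cofactor from the corresponding 2×2 minor.

```python
def det3(M):
    # Cofactor expansion along the first row
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def cofactor(M, i, j):
    """(-1)^(i+j) times the minor obtained by deleting row i and column j."""
    rows = [r for k, r in enumerate(M) if k != i]
    minor = [[v for l, v in enumerate(r) if l != j] for r in rows]
    return (-1) ** (i + j) * (minor[0][0] * minor[1][1] - minor[0][1] * minor[1][0])

A = [[2.0, -1.0, 0.5],
     [1.0,  3.0, 2.0],
     [0.0,  1.0, 4.0]]
h = 1e-6
grad = [[0.0] * 3 for _ in range(3)]
for i in range(3):
    for j in range(3):
        Ap = [row[:] for row in A]
        Am = [row[:] for row in A]
        Ap[i][j] += h
        Am[i][j] -= h
        grad[i][j] = (det3(Ap) - det3(Am)) / (2 * h)   # d det / d A_ij
C = [[cofactor(A, i, j) for j in range(3)] for i in range(3)]
print(grad)
print(C)
```

Since \(\det\mathbf{A}\) is affine in each single entry, the central difference here is exact up to rounding.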